NSF PAR Search | NSF Public Access Repository

On the Complexity of Teaching a Family of Linear Behavior Cloning Learners

Bharti, Shubham; Wright, Stephen; Singla, Adish; Zhu, Xiaojin (December 2024, Neural Information Processing Systems)

Full Text Available
Optimally Teaching a Linear Behavior Cloning Agent

Bharti, Shubham; Wright, Stephen; Singla, Adish; Zhu, Xiaojin (December 2024, NeurIPS 2024)

Full Text Available
Inception: Efficiently Computable Misinformation Attacks on Markov Games

McMahan, Jeremy; Wu, Young; Chen, Yudong; Zhu, Xiaojin; Xie, Qiaomin (August 2024, Reinforcement Learning Conference (RLC))

Full Text Available
Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

Wu, Young; McMahan, Jeremy; Chen, Yiding; Chen, Yudong; Zhu, Xiaojin; Xie, Qiaomin (June 2024, Proceedings of the 41st International Conference on Machine Learning)

Full Text Available
Optimal Attack and Defense for Reinforcement Learning

https://doi.org/10.1609/aaai.v38i13.29346

McMahan, Jeremy; Wu, Young; Zhu, Xiaojin; Xie, Qiaomin (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

For Reinforcement Learning (RL) to be useful in real systems, RL agents must be robust to noise and adversarial attacks. In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment. We study the full class of online manipulation attacks, which includes (i) state attacks, (ii) observation attacks (which are a generalization of perceived-state attacks), (iii) action attacks, and (iv) reward attacks. We show that the attacker's problem of designing a stealthy attack that maximizes its own expected reward, which often corresponds to minimizing the victim's value, is captured by a Markov Decision Process (MDP) that we call a meta-MDP since it is not the true environment but a higher-level environment induced by the attacked interaction. We show that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity using standard RL techniques. We argue that the optimal defense policy for the victim can be computed as the solution to a stochastic Stackelberg game, which can be further simplified into a partially-observable turn-based stochastic game (POTBSG). Neither the attacker nor the victim would benefit from deviating from their respective optimal policies, so such solutions are truly robust. Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.
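As a rough illustration of what "planning" in a tabular MDP can look like, the sketch below runs discounted value iteration on a toy two-state MDP. It is not the paper's meta-MDP construction: the function name `value_iteration`, the data layout of `P` and `R`, the discount factor, and the toy environment are all assumptions made for this example, and value iteration is only one of several standard planning techniques.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Discounted value iteration on a tabular MDP (illustrative sketch).

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    Returns the list of (approximately) optimal state values.
    """
    n_states = len(P)
    V = [0.0] * n_states
    while True:
        new_V = []
        for s in range(n_states):
            # Bellman optimality backup: best one-step lookahead value.
            q_values = [
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s]))
            ]
            new_V.append(max(q_values))
        # Stop once no state value changes by tol or more.
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            return new_V
        V = new_V


# Hypothetical toy MDP: from state 0, action 1 moves to the absorbing
# state 1 and earns reward 1; everything else earns reward 0.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: action 0 stays, action 1 moves
    [[(1, 1.0)], [(1, 1.0)]],  # state 1: absorbing under both actions
]
R = [
    [0.0, 1.0],
    [0.0, 0.0],
]
values = value_iteration(P, R)
```

In this toy case the best plan from state 0 is to take action 1 immediately, so its value is the single reward of 1 and the absorbing state has value 0.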
Full Text Available
Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption

https://doi.org/10.1609/aaai.v38i10.29022

Chen, Yiding; Zhang, Xuezhou; Xie, Qiaomin; Zhu, Xiaojin (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

We study offline reinforcement learning (RL) with heavy-tailed reward distribution and data corruption: (i) Moving beyond subGaussian reward distribution, we allow the rewards to have infinite variances; (ii) We allow corruptions where an attacker can arbitrarily modify a small fraction of the rewards and transitions in the dataset. We first derive a sufficient optimality condition for generalized Pessimistic Value Iteration (PEVI), which allows various estimators with proper confidence bounds and can be applied to multiple learning settings. In order to handle the data corruption and heavy-tailed reward setting, we prove that the trimmed-mean estimation achieves the minimax optimal error rate for robust mean estimation under heavy-tailed distributions. In the PEVI algorithm, we plug in the trimmed mean estimation and the confidence bound to solve the robust offline RL problem. Standard analysis reveals that data corruption induces a bias term in the suboptimality gap, which gives the false impression that any data corruption prevents optimal policy learning. By using the optimality condition for the generalized PEVI, we show that as long as the bias term is less than the "action gap", the policy returned by PEVI achieves the optimal value given sufficient data.
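To make the idea of trimmed-mean estimation concrete, here is a minimal sketch of the textbook symmetric trimmed mean. It is not necessarily the exact estimator analyzed in the paper, which may use different thresholds or sample splitting; the function name `trimmed_mean`, the `trim_fraction` parameter, and the sample rewards are assumptions for illustration only.

```python
def trimmed_mean(samples, trim_fraction):
    """Symmetric trimmed mean (illustrative sketch).

    Sorts the samples, discards the smallest and largest
    trim_fraction of them, and averages what remains.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(samples)
    k = int(trim_fraction * len(ordered))  # number dropped from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Hypothetical reward samples with one corrupted (or heavy-tailed) entry.
rewards = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0, 0.95, 1000.0]
plain = sum(rewards) / len(rewards)   # dominated by the outlier
robust = trimmed_mean(rewards, 0.1)   # drops one sample from each end
```

With these made-up samples the plain average is pulled far above every uncorrupted reward, while the trimmed mean stays close to 1.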
Full Text Available
Data Poisoning to Fake a Nash Equilibria for Markov Games

https://doi.org/10.1609/aaai.v38i14.29529

Wu, Young; McMahan, Jeremy; Zhu, Xiaojin; Xie, Qiaomin (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside it. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the inverse Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
Full Text Available
Reward Poisoning Attacks on Offline Multi-Agent Reinforcement Learning

https://doi.org/10.1609/aaai.v37i9.26240

Wu, Young; McMahan, Jeremy; Zhu, Xiaojin; Xie, Qiaomin (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset. We study reward-poisoning attacks in this setting where an exogenous attacker modifies the rewards in the dataset before the agents see the dataset. The attacker wants to guide each agent into a nefarious target policy while minimizing the Lp norm of the reward modification. Unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. We show that the attack works on various MARL agents including uncertainty-aware learners, and we exhibit linear programs to efficiently solve the attack problem. We also study the relationship between the structure of the datasets and the minimal attack cost. Our work paves the way for studying defense in offline MARL.
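The dominant-strategy notion in this abstract can be illustrated in a much simpler setting than the paper's. The sketch below checks whether a target action pair is a strict dominant strategy equilibrium in a single-state, two-player matrix game. The paper's MPDSE is defined for Markov games and the attack itself is solved with linear programs, neither of which is reproduced here; the function name `is_strict_dominant_profile`, the `margin` parameter, and the payoff tables are assumptions for this example.

```python
def is_strict_dominant_profile(payoffs_1, payoffs_2, target, margin=0.0):
    """Check a strict dominant strategy equilibrium in a matrix game.

    payoffs_1[a1][a2] and payoffs_2[a1][a2] are the two players' payoffs
    when player 1 plays a1 and player 2 plays a2. The target profile
    passes if each player's target action beats every alternative by
    more than `margin`, whatever the other player does.
    """
    a1_star, a2_star = target
    n1, n2 = len(payoffs_1), len(payoffs_1[0])
    # Player 1: compare the target row against every other row,
    # for each fixed action of player 2.
    for a2 in range(n2):
        for a1 in range(n1):
            if a1 != a1_star and payoffs_1[a1_star][a2] <= payoffs_1[a1][a2] + margin:
                return False
    # Player 2: compare the target column against every other column,
    # for each fixed action of player 1.
    for a1 in range(n1):
        for a2 in range(n2):
            if a2 != a2_star and payoffs_2[a1][a2_star] <= payoffs_2[a1][a2] + margin:
                return False
    return True


# Hypothetical prisoner's-dilemma payoffs (action 0 = cooperate, 1 = defect).
payoffs_1 = [[3, 0], [5, 1]]
payoffs_2 = [[3, 5], [0, 1]]
defect_is_dominant = is_strict_dominant_profile(payoffs_1, payoffs_2, (1, 1))
cooperate_is_dominant = is_strict_dominant_profile(payoffs_1, payoffs_2, (0, 0))
```

In this toy game mutual defection passes the check and mutual cooperation does not. A reward-poisoning attacker in the abstract's sense would aim to alter rewards so that its chosen target profile passes a check of this kind at minimal cost.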
Full Text Available
Game redesign in no-regret game playing

https://doi.org/10.24963/ijcai.2022/461

Ma, Yuzhe; Wu, Young; Zhu, Xiaojin (January 2022, The 31st International Joint Conference on Artificial Intelligence and the 25th European Conference on Artificial Intelligence)

Full Text Available
Corruption-robust offline reinforcement learning

Zhang, Xuezhou; Chen, Yiding; Zhu, Xiaojin; Sun, Wen (January 2022, The 25th International Conference on Artificial Intelligence and Statistics)

Full Text Available
